Dependency-structure Annotation to Corpus of Spontaneous Japanese

نویسندگان

  • Kiyotaka Uchimoto
  • Ryoji Hamabe
  • Takehiko Maruyama
  • Katsuya Takanashi
  • Tatsuya Kawahara
  • Hitoshi Isahara
چکیده

In Japanese, syntactic structure of a sentence is generally represented by the relationship between phrasal units, or bunsetsus in Japanese, based on a dependency grammar. In the same way, the syntactic structure of a sentence in a large, spontaneous, Japanese-speech corpus, the Corpus of Spontaneous Japanese (CSJ), is represented by dependency relationships between bunsetsus. This paper describes the criteria and definitions of dependency relationships between bunsetsus in the CSJ. The dependency structure of the CSJ is investigated, and the difference in the dependency structures of written text and spontaneous speech is discussed in terms of the dependency accuracies obtained by using a corpus-based model. It is shown that the accuracy of automatic dependency-structure analysis can be improved if characteristic phenomena of spontaneous speech — such as self-corrections, basic utterance units in spontaneous speech, and bunsetsus that have no modifiee — are detected and used for dependency-structure analysis.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Word-level Dependency-structure Annotation to Corpus of Spontaneous Japanese and its Application

In Japanese, the syntactic structure of a sentence is generally represented by the relationship between phrasal units, bunsetsus in Japanese, based on a dependency grammar. In many cases, the syntactic structure of a bunsetsu is not considered in syntactic structure annotation. This paper gives the criteria and definitions of dependency relationships between words in a bunsetsu and their applic...

متن کامل

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

متن کامل

Dependency structure analysis and sentence boundary detection in spontaneous Japanese

This paper addresses automatic detection of dependencies between Japanese phrasal units called bunsetsus, and sentence boundaries in a spontaneous speech corpus. In spontaneous speech, the biggest problem with dependency structure analysis is that sentence boundaries are ambiguous. In this paper, we propose two methods for improving the accuracy of sentence boundary detection in spontaneous Jap...

متن کامل

A Japanese Word Dependency Corpus

In this paper, we present a corpus annotated with dependency relationships in Japanese. It contains about 30 thousand sentences in various domains. Six domains in Balanced Corpus of Contemporary Written Japanese have part-of-speech and pronunciation annotation as well. Dictionary example sentences have pronunciation annotation and cover basic vocabulary in Japanese with English sentence equival...

متن کامل

Stochastic Dependency Parsing of Spontaneous Japanese Spoken Language

This paper describes the characteristic features of dependency structures of Japanese spoken language by investigating a spoken dialogue corpus, and proposes a stochastic approach to dependency parsing. The method can robustly cope with inversion phenomena and bunsetsus which don’t have the head bunsetsu by relaxing the syntactic dependency constraints. The method acquires in advance the probab...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006